NDS Speed Boost #24
src/goddard/gd_math.c
@@ -178,7 +178,7 @@ void gd_create_origin_lookat(Mat4f *mtx, struct GdVec3f *vec, f32 roll) {

     gd_set_identity_mat4(mtx);
     if (hMag != 0.0f) {
-        invertedHMag = 1.0f / hMag;
+        invertedHMag = swiDivide(1.0f, hMag);
swiDivide doesn't support floats. Also, changes to the game code should be ifdef'ed to preserve the N64 build.
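To illustrate both points, here is a minimal sketch that keeps the original float division for the N64 build and only takes a different path when an NDS-specific macro is defined. `TARGET_NDS`, `inverse_magnitude`, `FIXED_SHIFT` and `int_divide` are hypothetical names for illustration; `int_divide` is a host-side stand-in for an integer-only divide, not the libnds function, and the 20.12 fixed-point format is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_SHIFT 12 /* 20.12 fixed point; an assumption for this sketch */

/* Hypothetical stand-in for an integer-only divide. It is not the libnds
 * function; it only shows that integer division drops the fraction. */
static int32_t int_divide(int32_t numerator, int32_t denominator) {
    return numerator / denominator;
}

/* Hypothetical helper mirroring the reciprocal in gd_create_origin_lookat. */
static float inverse_magnitude(float hMag) {
    assert(hMag != 0.0f); /* the caller already checks this */
#ifdef TARGET_NDS
    /* NDS-only path: convert to fixed point so an integer divide is usable.
     * Assumes hMag is small enough to fit in 20.12. */
    int32_t fixedMag = (int32_t)(hMag * (float)(1 << FIXED_SHIFT));
    if (fixedMag == 0) {
        return 1.0f / hMag; /* too small for fixed point, fall back to float */
    }
    int32_t fixedInv = int_divide(1 << (2 * FIXED_SHIFT), fixedMag);
    return (float)fixedInv / (float)(1 << FIXED_SHIFT);
#else
    /* N64 and other builds keep the original expression. */
    return 1.0f / hMag;
#endif
}
```

Passing `1.0f` and a fractional `hMag` straight into an integer divide would truncate both arguments, which is why the NDS branch converts to fixed point first. Whether that is actually faster than the plain float division would need measuring on hardware.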
src/nds/nds_renderer.c
@@ -417,7 +417,7 @@ static void g_vtx(Gwords *words) {
     const Vtx *vertices = (const Vtx*)words->w1;

     // Store vertices in the vertex buffer
-    memcpy(&vertex_buffer[index - count], vertices, count * sizeof(Vtx));
+    swiFastCopy(vertices, &vertex_buffer[index - count], sizeof(Vtx) * 4);
Why does this replace count with 4? That seems like a bug.
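For reference, the length has to scale with count whichever copy routine is used; only the unit changes. A minimal sketch, assuming a 16-byte vertex and a copy routine that takes its length in 32-bit words (which is how GBATEK describes the BIOS fast copy). `Vertex16`, `copy_words`, `vertex_bytes` and `vertex_words` are hypothetical stand-ins, not the real `Vtx` or `swiFastCopy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vertex, standing in for the real Vtx layout. */
typedef struct {
    int16_t  pos[3];
    uint16_t flag;
    int16_t  uv[2];
    uint8_t  color[4];
} Vertex16;

/* Host-side stand-in for a copy routine whose length is a word count. */
static void copy_words(const void *src, void *dst, size_t wordCount) {
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, wordCount * sizeof(uint32_t));
}

/* Length in bytes, the unit memcpy expects. */
static size_t vertex_bytes(size_t count) {
    return count * sizeof(Vertex16);
}

/* Length in 32-bit words, the unit a word-based copy expects. */
static size_t vertex_words(size_t count) {
    return vertex_bytes(count) / sizeof(uint32_t);
}
```

With these helpers, copying `count` vertices is either `memcpy(dst, src, vertex_bytes(count))` or `copy_words(src, dst, vertex_words(count))`. A constant such as `sizeof(Vtx) * 4` only matches one specific value of count.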
fyi swiFastCopy is bugged and actually significantly slower than memcpy, if you want something faster try DMA or tonccpy or some other DS optimized memcpy
http://problemkaputt.de/gbatek-bios-memory-copy.htm
BUG: The NDS/DSi uses the fast 32-byte-block processing only for the first N bytes (not for the first N words), so only the first quarter of the memory block is FAST, the remaining three quarters are SLOWLY copied word-by-word.
Thank you for the info. I tried DMA copy and found it caused graphical corruption. This is the first I have heard of tonccpy. Would you say that normal swiCopy would be good for this?
swiCopy is even worse than swiFastCopy. I did some testing and tonccpy seems to be more or less equal to memcpy (but VRAM safe), swiFastCopy is about 10% slower than memcpy, and swiCopy is about half the speed of memcpy. For whatever reason dmaCopy isn't cooperating with my testing, so I'm not sure exactly where it lands, but I know it should be faster than memcpy.
You might need to flush the cache for DMA copy to work: since DMA is separate from the CPU, it can't access the CPU's cache. I'm not sure how big of a speed penalty cache flushing has, so CPU caching might end up faster in some cases because of that.
Edit: DMA wasn't cooperating because I was using no$gba, it turns out, though my results still seem a bit weird on hardware... I'm getting DMA at about half the speed of memcpy, which doesn't seem right... I'm just putting cpuStartTiming(0) before and cpuEndTiming() after doing a large copy, not sure if there's a better way to do that.
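One way to make a comparison like this less noisy is to time many iterations and average them. The sketch below is a host-side harness using the standard `clock()`; `time_copy` and `copy_fn` are hypothetical names. On hardware the two `clock()` calls would be replaced by the cpuStartTiming(0) / cpuEndTiming() pair mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's argument order can be plugged in here. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Returns the average time in seconds for one copy of n bytes. */
static double time_copy(copy_fn copy, void *dst, const void *src,
                        size_t n, int iterations) {
    assert(iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        copy(dst, src, n);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

For example, `time_copy(memcpy, dst, src, size, 1000)` gives a baseline to compare other routines against. Buffer placement and cache state can skew the numbers, so it may help to benchmark with the same buffers the renderer actually uses.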
I would love to use dmaCopy but I can't seem to get it to work without corrupted graphics. I am flushing the cache.
I used
DC_FlushRange(&vertex_buffer[index - count], count * sizeof(Vtx));
dmaCopy(vertices, &vertex_buffer[index - count], count * sizeof(Vtx));
I have no idea what is wrong. I tried using
DC_FlushRange(vertices, count * sizeof(Vtx));
But that is even worse.
You need to flush the source, not the destination
DC_FlushRange(vertices, count * sizeof(Vtx));
edit: didn't see your whole message whoops, not sure why it's not working tbh 😅, i usually just use tonccpy since it's good enough and always works
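For anyone who wants to keep experimenting with DMA, a common pattern is to maintain the cache on both sides of the transfer: write back the source, write back the destination before the transfer, then invalidate the destination afterwards so the CPU does not read stale lines. This is a sketch, not a confirmed fix for the corruption described above. `dma_copy_cached` is a hypothetical name, the `ARM9` macro check assumes the usual libnds build flags, and the `#else` branch only provides host stand-ins so the code compiles off-device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef ARM9
#include <nds.h>
#else
/* Host stand-ins: there is no data cache to maintain off-device. */
static void DC_FlushRange(const void *base, uint32_t size) {
    (void)base; (void)size;
}
static void DC_InvalidateRange(const void *base, uint32_t size) {
    (void)base; (void)size;
}
static void dmaCopy(const void *src, void *dst, uint32_t size) {
    memcpy(dst, src, size);
}
#endif

/* Hypothetical wrapper: DMA copy between cached main-RAM buffers. */
static void dma_copy_cached(const void *src, void *dst, uint32_t size) {
    assert((size & 1) == 0);       /* assumption: halfword-based transfer */
    DC_FlushRange(src, size);      /* make the CPU's writes visible to DMA */
    DC_FlushRange(dst, size);      /* write back dirty destination lines */
    dmaCopy(src, dst, size);
    DC_InvalidateRange(dst, size); /* drop stale lines so the CPU re-reads */
}
```

Two things may still get in the way: cache maintenance works on whole cache lines, so buffers that are not line-aligned can affect neighbouring data, and DMA cannot reach TCM, so a source that lives on a stack placed in DTCM would not be readable by the transfer.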
I did not think about preserving the N64 build, my apologies. As for replacing count with 4, it seems that swiFastCopy calculates size differently than memcpy. I will start adding the ifdefs in a bit.
Thanks to Epicpkmn11 for letting me know of its existence.
Maybe you can fix the in-game dialog fonts too? That would be great!
BTW the optimization flag -Ofast is better than -O3 in some cases (I have also tested it and as far as I can tell there is no downside to doing so)
I made a few small changes to boost the speed a bit.
I used NDS BIOS math and hardware-accelerated math where I found it possible, and optimized with -O3 instead of -O2.